Data construction for phosphorylation site prediction

نویسندگان

  • Haipeng Gong
  • Xiaoqing Liu
  • Jun Wu
  • Zengyou He
چکیده

Protein phosphorylation is one of the most pervasive post-translational modifications, regulating diverse cellular processes in various organisms. As mass spectrometry-based experimental approaches for identifying phosphorylation events are resource-intensive, many computational methods have been proposed, in which phosphorylation site prediction is formulated as a classification problem. They differ in several ways, and one crucial issue is the construction of training data and test data for unbiased performance evaluation. In this article, we categorize the existing data construction methods and try to answer three questions: (i) Is it equivalent to use different data construction methods in the assessment of phosphorylation site prediction algorithms? (ii) What kind of test data set is unbiased for assessing the prediction performance of a trained algorithm in different real world scenarios? (iii) Among the summarized training data construction methods, which one(s) has better generalization performance for most scenarios? To answer these questions, we conduct comprehensive experimental studies for both non-kinase-specific and kinase-specific prediction tasks. The experimental results show that: (i) different data construction methods can lead to significantly different prediction performance; (ii) there can be different test data construction methods that are unbiased with respect to different real world scenarios; and (iii) different data construction methods have different generalization performance in different real world scenarios. Therefore, when developing new algorithms in future research, people should concentrate on what kind of scenario their algorithm will work for, what the corresponding unbiased test data are and which training data construction method can generate best generalization performance.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Real-Time Building Information Modeling (BIM) Synchronization Using Radio Frequency Identification Technology and Cloud Computing System

The online observation of a construction site and processes bears significant advantage to all business sector. BIM is the combination of a 3D model of the project and a project-planning program which improves the project planning model by up to 6D (Adding Time, Cost and Material Information dimensions to the model). RFID technology is an appropriate information synchronization tool between the...

متن کامل

A New Machine Learning Approach for Protein Phosphorylation Site Prediction in Plants

Protein phosphorylation is a crucial regulatory mechanism in various organisms. With recent improvements in mass spectrometry, phosphorylation site data are rapidly accumulating. Despite this wealth of data, computational prediction of phosphorylation sites remains a challenging task. This is particularly true in plants, due to the limited information on substrate specificities of protein kinas...

متن کامل

Rough sets theory in site selection decision making for water reservoirs

Rough Sets theory is a mathematical approach for analysis of a vague description of objects presented by a well-known mathematician, Pawlak (1982, 1991). This paper explores the use of Rough Sets theory in site location investigation of buried concrete water reservoirs. Making an appropriate decision in site location can always avoid unnecessary expensive costs which is very important in constr...

متن کامل

NetPhosYeast: prediction of protein phosphorylation sites in yeast

UNLABELLED We here present a neural network-based method for the prediction of protein phosphorylation sites in yeast--an important model organism for basic research. Existing protein phosphorylation site predictors are primarily based on mammalian data and show reduced sensitivity on yeast phosphorylation sites compared to those in humans, suggesting the need for an yeast-specific phosphorylat...

متن کامل

Musite, a tool for global prediction of general and kinase-specific phosphorylation sites.

Reversible protein phosphorylation is one of the most pervasive post-translational modifications, regulating diverse cellular processes in various organisms. High throughput experimental studies using mass spectrometry have identified many phosphorylation sites, primarily from eukaryotes. However, the vast majority of phosphorylation sites remain undiscovered, even in well studied systems. Beca...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Briefings in bioinformatics

دوره 15 5  شماره 

صفحات  -

تاریخ انتشار 2014